in Determination of Safe Harbor Eligibility 
University of Arkansas - Fayetteville 

Abstract 

As part of No Child Left Behind (NCLB) legislation, many states are using 
confidence intervals to determine a range of scores for evaluating a school 
system. More specifically, the states are employing confidence intervals to help 
minimize measurement error in determining a school system’s performance. 
The methodology and techniques employed in these NCLB calculations 
for confidence intervals have raised several questions with regard to 
appropriateness, methods, and the transfer to educational policy. The purpose 
of this paper is to review the methodology, application, and impact of the 
various methods in regard to educational policy. Additionally, simulations 
that examine variations in sample size and proportions were completed in 
order to examine how inconsistency can impact the determination of a school’s 
performance relative to the achievement goals. 

Background 

The No Child Left Behind (NCLB) legislation, implemented in 2002, 
mandated that schools and districts be evaluated relative to state performance 
standards. Further, their performance is assigned a “grade” or designation of 
“Meets Standard,” “Alert,” or “School Improvement.” A school is assigned 
the designation of “Meets Standard” if overall student performance on 
achievement tests attains the designated criteria established by the individual 
state. A school is assigned the status of “Alert” if it fails to meet designated 
performance standards for the current year but attained the status 
“Meets Standard” in the previous year. If a school fails to meet designated 
performance standards for two consecutive years, the school is placed in 
“School Improvement” and, by statute in NCLB, parents must be provided 
the opportunity to transfer their child to alternative schools that have met 
the performance standard. Additionally, schools can be required to provide 
tutoring or other student support mechanisms, which translates into increased 
financial costs for districts. 

The assignment of a school to “School Improvement” can be avoided 
through a “Safe Harbor” provision within NCLB (NCLB, 2002). Safe Harbor 
is a flexible provision within NCLB regulations that allows for consideration 
of a school system’s academic improvement during the most recent year or 
other time period as deemed appropriate. A school can be deemed as “Meets 
Standard” by exceeding the annual performance goals or by improving 
performance by a predetermined amount. If School A makes “adequate yearly 
progress,” the standard for growth during the assigned time period, it is deemed 
as “Meets Standard” for performance consistent with NCLB legislation. 

For example, Safe Harbor in Arkansas is attained if a school meets the 
criteria for attendance, percentage of students tested, and a 10% growth in 
achievement during the current year. The attendance rate and percent tested 
criteria are static at 91.13% and 95.0%, respectively. The 10% growth, however, 
is based on each school’s previous year’s performance. The amount of expected 
growth is straightforward to compute. For example, if a school had 20% of 
students proficient on the achievement test last year, this year it must 
increase the percent of students proficient on the exam by 10% of the 
difference between 20% and 100% (i.e., (100 - 20) divided by 10). Thus, the 
performance growth goal for Safe Harbor determination for this school is 
8%. 
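
The growth computation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text; the function names are hypothetical and are not part of any state accountability system.

```python
def safe_harbor_growth_goal(prior_pct_proficient: float) -> float:
    """Required growth in percentage points: 10% of the gap between
    last year's percent proficient and 100%."""
    return (100.0 - prior_pct_proficient) / 10.0


def safe_harbor_target(prior_pct_proficient: float) -> float:
    """Percent proficient the school must reach this year."""
    return prior_pct_proficient + safe_harbor_growth_goal(prior_pct_proficient)


if __name__ == "__main__":
    # The example from the text: 20% proficient last year.
    print(safe_harbor_growth_goal(20.0))  # growth goal in percentage points
    print(safe_harbor_target(20.0))       # target percent proficient
```

For the school in the text, the goal is 8 percentage points, so the target is 28% proficient rather than the 22% that a simple 10% improvement on last year's figure would imply.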

It is also acknowledged that a certain amount of measurement error will 
exist in this process, so to provide a best-case scenario for schools, the use of 
confidence intervals has been proposed to develop lower and upper bound 
values in the system. Numerous statistical issues have been raised in regard to 
the development and implementation of confidence intervals in this process, 
including inaccurate determination, discrepancies in sample sizes, and the 
choice of one-tailed or two-tailed intervals. 

Example of a School System Appeal 

Suppose School System A has appealed the designation of the performance 
category of “School Improvement” and applied for “Safe Harbor” in part based 
on the calculation of confidence intervals by the Arkansas Department of 
Education. As stated in the Arkansas Consolidated State Accountability Plan, 
to invoke the “Safe Harbor” provision schools must meet three conditions: 
(a) they must have tested 95% of their students; (b) if a school does not have 
a high school graduation rate, it must meet the 91.13% attendance rate; if 
a school does have high school graduation, it must attain a graduation rate 
of 73.9%; and (c) they must have a 10% reduction in the difference between 
last year’s performance and the attainment of 100% of students proficient 
on the achievement test, as described above (from 20% to 28%). This is referred 
to as a 10% growth, but should not be confused with a 10% improvement 
from last year’s performance, which in the example above would be from 20% 
to 22%. Additionally, School System A has raised the issue of performance 
against the state standards for Literacy and Mathematics (see Table 1). Table 
1 provides the performance goals for schools and the lower bound values. A 
school is expected to meet the performance goal, but if the school meets the 
lower bound value for the confidence interval it is considered to have met 
performance standards for the academic year. 
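
The three conditions listed above can be expressed as a simple check. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes each threshold is met when the observed value is greater than or equal to it, which the text does not state explicitly. It does not apply the confidence-interval lower bound.

```python
from typing import Optional


def meets_safe_harbor(
    pct_tested: float,
    prior_pct_proficient: float,
    current_pct_proficient: float,
    attendance_rate: Optional[float] = None,
    graduation_rate: Optional[float] = None,
) -> bool:
    """Check the three Safe Harbor conditions described in the text."""
    # (a) At least 95% of students tested.
    tested_ok = pct_tested >= 95.0

    # (b) Graduation rate of 73.9% if the school has one,
    #     otherwise an attendance rate of 91.13%.
    if graduation_rate is not None:
        rate_ok = graduation_rate >= 73.9
    else:
        rate_ok = attendance_rate is not None and attendance_rate >= 91.13

    # (c) Close 10% of the gap between last year's percent proficient and 100%.
    target = prior_pct_proficient + (100.0 - prior_pct_proficient) / 10.0
    growth_ok = current_pct_proficient >= target

    return tested_ok and rate_ok and growth_ok
```

Under these assumptions, a school that moved from 20% to 28% proficient satisfies condition (c), while one that moved from 20% to 22% does not.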

Methodology 

Computation of Confidence Intervals 

Various methods can be used for computing confidence intervals. A 
comparison of the differences in the two most widely used methods is 
included in this review for the School System A appeal. The first method is the 
traditional method for computing confidence intervals for proportions (Glass 
& Hopkins, 1996). The second method is the Ghosh method (1979), which 
addresses distributional and sample size issues that can be problematic in 
the more traditional method. 

Method 1: The Traditional Method 

The confidence intervals were computed using standard statistical 
methodology for computing these ranges (Glass & Hopkins, 1996). Both two-sided 
and one-sided confidence intervals were computed using a 75% confidence 
band. In layman’s terms, this means that over repeated samples, one would expect 
the “true” percentage of proficient students for a school to reside in 75% of 
the intervals. A 75% confidence interval was employed in lieu of the more 
traditional 68% or 95% intervals due to language used in the approval of a 
statewide school improvement plan. Originally, if a school met 75% of its 
performance goal (i.e., if the goal was 10% growth and the school obtained 7.5% 
or greater), it was considered to have “MET” its growth expectations. Given the 
language submitted and approved by the U.S. Department of Education, 75% 
confidence intervals were employed. 

Table 1 

Literacy and Mathematics Performance and Standards for 2003 

One-Tail versus Two-Tailed Confidence Intervals 

A two-tailed confidence interval equally divides the 75% confidence interval 
around a school’s obtained percentage of students proficient. Thus, for a 75% 
confidence interval, 37.5% of this band is below the obtained score and 37.5% 
is above it. Using the normal approximation, a z-value is obtained to identify 
37.5% of the area from the center of a standard normal curve, in this case 
z = ±1.15, and is multiplied by the standard error to create the confidence 
interval, as demonstrated in the provided example. 

If the hypothesis or direction of a percentage is known a priori, a one-tailed 
confidence interval can also be calculated using the 75% criterion. Using this 
method, 75% of the distribution is identified as resting below or above an 
identified value, predicated on the directional hypothesis for performance. 
The z-value from the standard normal table is identified, in this case 
z = .674, and is used to compute the confidence interval (see Tables 2 and 3). 

Table 2 

Two-Tailed 75% Confidence Intervals Using Both Traditional and Ghosh Methods for Literacy and Mathematics for Selected School 

Note: Prolit = percent proficient in literacy; Promath = percent proficient in mathematics. 

Table 3 

One-Tailed 75% Confidence Intervals Using Both Traditional and Ghosh Methods for Literacy and Mathematics for Selected School 

Note: Prolit = percent proficient in literacy; Promath = percent proficient in mathematics. 

Equation for both One-Tailed and Two-Tailed Confidence Intervals 

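The confidence intervals were calculated using the following formula: 

$$\text{C.I.} = p \pm z\,\sigma_p, \qquad \sigma_p = \sqrt{\frac{p(1-p)}{N}}$$

with a z-value of 1.15, where “p” is the percent of students proficient 
(entered as a proportion when computing $\sigma_p$), $\sigma_p$ is the standard 
error of p, and N is the sample size. For the one-tail confidence intervals a 
z-value of .674 is used and the C.I. has the form $p + z\,\sigma_p$. 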
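
The two z-values quoted above can be checked against the standard normal distribution. The following is a minimal sketch using Python's standard library; the helper names are hypothetical.

```python
from statistics import NormalDist


def z_two_tailed(confidence: float) -> float:
    """z such that the central `confidence` area lies within +/- z."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def z_one_tailed(confidence: float) -> float:
    """z such that `confidence` of the area lies below z."""
    return NormalDist().inv_cdf(confidence)


if __name__ == "__main__":
    print(round(z_two_tailed(0.75), 3))  # approximately 1.15
    print(round(z_one_tailed(0.75), 3))  # approximately 0.674
```

The two-tailed value is larger because only 12.5% of the area is left in each tail, whereas the one-tailed value leaves the full 25% in a single tail.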

Example of Use of Confidence Intervals 

Using the School Total data for School A from Table 2, School A 
obtained scores of 19.14 percent proficient for 2002 Literacy and 28.07 
percent for 2003 Literacy, an improvement of 8.93 percent. The goal 
for growth was (100 - 19.14)/10 = 8.086, so the target for 2003 was 19.14 + 
8.086 = 27.23 percent of students proficient. The elementary school met this 
goal with 28.07 percent of its students proficient. The upper bounds for the 
confidence intervals are: 

one-tailed: 28.07 + (.674)(2.16) = 29.53 
two-tailed: 28.07 + (1.15)(2.16) = 30.55 

The values demonstrate the statistical power issues associated with using 
one-tailed versus two-tailed confidence intervals. In practice, smaller 
confidence intervals are desirable. A common use of these intervals is in 
hypothesis testing, where a hypothesized value is compared against the interval 
to determine statistical significance. Failure to have this hypothesized 
value within the confidence interval indicates a “statistically significant” 
difference between the obtained value and the hypothesized value. Typically, 
a difference of this magnitude is important for researchers. Here, one can be 
75% confident that the true proportion is less than or equal to 29.53 
(one-tailed) and 75% confident that the true proportion is between 25.59 and 
30.55 (two-tailed). In practice, however, if the school had a performance goal 
of 30% of students proficient, it would have been judged to “meet” this standard 
using the two-tailed interval, which provides a margin for error in 
consideration of the school’s “true” performance. In the context of attempting 
to obtain an easier standard for assessing whether a school was within a 75% 
confidence interval of meeting the performance growth, the two-tailed case 
actually creates a more rigid standard, with a “wider” interval and a greater 
upper bound (30.55) than the 29.53 obtained for a one-tailed confidence 
interval. 
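
The bounds in this example can be reproduced with a short Python sketch of the traditional (normal approximation) method. The function names are hypothetical. The standard error of 2.16 is taken directly from the worked example, since the sample size behind it is not stated in the text; `standard_error_pct` shows how it would be obtained if N were known.

```python
import math


def standard_error_pct(p_pct: float, n: int) -> float:
    """Standard error of a percentage under the normal approximation."""
    return math.sqrt(p_pct * (100.0 - p_pct) / n)


def two_tailed_bounds(p_pct: float, se: float, z: float = 1.15):
    """Lower and upper bounds of the two-tailed interval."""
    return p_pct - z * se, p_pct + z * se


def one_tailed_upper(p_pct: float, se: float, z: float = 0.674) -> float:
    """Upper bound of the one-tailed interval."""
    return p_pct + z * se


if __name__ == "__main__":
    p_2003, se = 28.07, 2.16  # values quoted in the example
    lower, upper = two_tailed_bounds(p_2003, se)
    print(round(one_tailed_upper(p_2003, se), 2))
    print(round(lower, 2), round(upper, 2))
```

With a hypothetical performance goal of 30%, the two-tailed upper bound reaches the goal while the one-tailed upper bound does not, which is the discrepancy the paragraph above describes.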


Method 2: The Ghosh Method 

The Ghosh method uses the binomial distribution, in contrast to the 
normal distribution, and has been demonstrated to be more accurate than other 
procedures (Ghosh, 1979; Glass & Hopkins, 1996). Next, the Ghosh method 
and equations are introduced and applied to the same conditions as the 
traditional method for computing confidence intervals. 


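Equations for Two-Tailed Confidence Intervals 

$$\pi_L = \frac{n}{n+z^2}\left[p + \frac{z^2}{2n} - z\sqrt{\frac{pq}{n} + \frac{z^2}{4n^2}}\right], \qquad \pi_U = \frac{n}{n+z^2}\left[p + \frac{z^2}{2n} + z\sqrt{\frac{pq}{n} + \frac{z^2}{4n^2}}\right]$$

where p is the sample proportion, q = 1 - p, n is the sample size, and z = 1.15 
for the 75% two-tailed interval. 

Thus, one can be 75% confident that the value of $\pi$ is within the range 
[$\pi_L$, $\pi_U$]. The actual probability that $\pi$ is within any specific 
interval is either 0 or 1. 

Equation for One-Tailed Confidence Intervals 

A one-tailed confidence interval that sets an upper bound, i.e., we are 
75% confident that $\pi$ is less than or equal to $\pi_U$, is as follows: 

$$\pi_U = \frac{n}{n+z^2}\left[p + \frac{z^2}{2n} + z\sqrt{\frac{pq}{n} + \frac{z^2}{4n^2}}\right]$$

For the 75% one-tail C.I. for $\pi$ less than or equal to $\pi_U$ we have: 

$$\pi_U = \frac{n}{n+(.674)^2}\left[p + \frac{(.674)^2}{2n} + (.674)\sqrt{\frac{pq}{n} + \frac{(.674)^2}{4n^2}}\right] = \frac{n}{n+.4543}\left[p + \frac{.2271}{n} + (.674)\sqrt{\frac{pq}{n} + \frac{.1136}{n^2}}\right]$$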
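
The Ghosh bounds can be computed directly from the equations above. The sketch below is an illustration rather than the authors' code; the function names are hypothetical, and the z-values of 1.15 and .674 are those quoted in the text.

```python
import math


def ghosh_bounds(p: float, n: int, z: float = 1.15):
    """Two-tailed Ghosh bounds (pi_L, pi_U) for a sample proportion p."""
    z2 = z * z
    scale = n / (n + z2)                      # n / (n + z^2)
    center = p + z2 / (2.0 * n)               # p + z^2 / (2n)
    half_width = z * math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n))
    return scale * (center - half_width), scale * (center + half_width)


def ghosh_upper_one_tailed(p: float, n: int, z: float = 0.674) -> float:
    """One-tailed Ghosh upper bound pi_U."""
    return ghosh_bounds(p, n, z)[1]


if __name__ == "__main__":
    lower, upper = ghosh_bounds(0.1, 40)
    print(round(lower, 3), round(upper, 3))
    print(round(ghosh_upper_one_tailed(0.1, 40), 3))
```

For a proportion of .1 and a sample of 40, the two-tailed bounds agree with the corresponding Ghosh entries in Table 4 to three decimal places.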


Comparison of the Results of the Two Methods 

Effectively, the two methods produce similar results for larger sample 
sizes (see Table 4). However, given the NCLB issues associated with smaller 
sample sizes, and the incredibly discrepant nature of school sizes in rural 
states such as Arkansas, the Ghosh method appears to be more equitable. For 
example, if you adjust the sample size to represent a very small school of 40 
students versus a very large school of 500 students, the values for the Ghosh 
confidence interval are slightly smaller. You can also see that as the value 
of $\pi$ deviates from .5, the Ghosh method makes a greater adjustment from the 
traditional method. In all cases, the decisions using the traditional and Ghosh 
methods are the same (see Tables 2 and 3), but the Ghosh method provides 
additional protection against sample size variation and is recognized as the 
more efficient method. 
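
The comparison across sample sizes can be sketched as a small simulation grid in Python. This is an illustrative reconstruction under stated assumptions (two-tailed intervals with z = 1.15, sample sizes of 40, 100, and 500, and proportions from .1 to .9), not the authors' original program; all function names are hypothetical.

```python
import math

Z_TWO_TAILED_75 = 1.15  # z-value quoted in the text for a two-tailed 75% interval


def traditional_bounds(p, n, z=Z_TWO_TAILED_75):
    """Normal-approximation bounds: p +/- z * sqrt(p(1-p)/n)."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


def ghosh_bounds(p, n, z=Z_TWO_TAILED_75):
    """Ghosh bounds, following the two-tailed equations in the text."""
    z2 = z * z
    scale = n / (n + z2)
    center = p + z2 / (2.0 * n)
    half_width = z * math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n))
    return scale * (center - half_width), scale * (center + half_width)


def lower_bound_shift(p, n):
    """Absolute difference between the two methods' lower bounds."""
    return abs(ghosh_bounds(p, n)[0] - traditional_bounds(p, n)[0])


if __name__ == "__main__":
    print("n    p    trad_LB  trad_UB  ghosh_LB  ghosh_UB")
    for n in (40, 100, 500):
        for tenths in range(1, 10):
            p = tenths / 10.0
            t_lo, t_hi = traditional_bounds(p, n)
            g_lo, g_hi = ghosh_bounds(p, n)
            print(f"{n:<4d} {p:.1f}  {t_lo:.3f}    {t_hi:.3f}    {g_lo:.3f}     {g_hi:.3f}")
```

The helper `lower_bound_shift` makes the two patterns noted above easy to inspect: the gap between the methods shrinks as the sample size grows, and it is larger when the proportion is far from .5.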

Table 4 

Comparing Three Sample Sizes for Ghosh Method 

                            Traditional        Ghosh 
Sample Size   Proportion    LB      UB         LB      UB 
40            .1            .045    .155       .058    .168 
40            .2            .127    .273       .137    .281 
40            .3            .217    .383       .224    .389 
40            .4            .311    .489       .316    .491 
40            .5            .409    .591       .411    .589 
40            .6            .511    .689       .509    .685 
40            .7            .617    .783       .611    .776 
40            .8            .727    .873       .718    .863 
40            .9            .845    .955       .832    .942 
100           .1            .065    .135       .071    .140 
100           .2            .154    .246       .158    .250 
100           .3            .247    .353       .250    .355 
100           .4            .344    .456       .345    .457 
100           .5            .443    .558       .443    .557 
100           .6            .544    .656       .543    .655 
100           .7            .647    .753       .645    .750 
100           .8            .754    .846       .750    .842 
100           .9            .866    .935       .860    .929 
500           .1            .085    .115       .086    .117 
500           .2            .179    .221       .180    .221 
500           .3            .276    .324       .277    .324 
500           .4            .375    .425       .375    .425 
500           .5            .474    .526       .474    .526 
500           .6            .575    .625       .575    .625 
500           .7            .676    .724       .676    .723 
500           .8            .779    .821       .779    .820 
500           .9            .885    .915       .884    .914 

Note: LB = lower bound; UB = upper bound. 

Importance for Education 

The implementation of NCLB legislation has created many interesting 
measurement and statistical questions. Further, given the bipartisan support 
in Congress for NCLB, it is unlikely there will be any significant changes 
in this legislation until the reauthorization of the Elementary and Secondary 
Education Act (ESEA). Given that the consequences of NCLB legislation are 
very real for schools, educators, and students, it is paramount that groups such 
as educational statisticians complete studies and provide insight on methods 
and assessment practices that are appropriate. 

Recommendations for NCLB and Implications for State Educational 
Agencies 

A reality of NCLB was that the operationalizing of this legislation, aside 
from some very broad guidelines, was left to state educational agencies (SEAs). 
The use of confidence intervals is understood and appreciated, but some 
specific recommendations for NCLB include: 

1) Allow states to adjust scores using one standard deviation with the 
standard error of measurement. If the upper limit of a student’s interval includes 
the “passing score,” they can report the student as provisionally passing. This 
would indicate the student’s score was below the “passing score” but within 
measurement error. 

2) If use of confidence intervals at the school level is continued, it is 
recommended the Ghosh method be employed. Additionally, it is recommended 
the width of the intervals be limited to 68%. Given the large percentage of 
students tested from the school’s “population,” it is expected there will be 
limited measurement error in the “true” score for the school system. 

From a policy perspective, it is important that SEAs embrace the intent 
of NCLB to measure the performance of school systems and ensure that all 
students are receiving access to a quality education. The use of statistical 
approaches that are only positively biased, such as how confidence intervals 
have been applied, represents only one area where the policies of NCLB have 
been inconsistent with sound mathematical and statistical methodologies. 
Given that the actual NCLB legislation included the term “scientifically based” 
over 100 times, it seems reasonable to expect, even demand, that the measurement 
and statistical models employed to evaluate school systems be held to a standard 
beyond what is politically expedient or legally within the confines 
of the law. The use of suspect statistical applications tends to detract 
from the otherwise laudable efforts to improve the K-12 system nationally via 
the implementation of NCLB. The intent of this research is to help provide 
improved assessment of school performance, which should be the initial step 
in any reform model. 